Introduction

Here we use random forests to quantify how well different variables explain the growth, mortality, and recruitment of tree species. The data come from the forest inventory database for eastern North American forests.

For each of the 34 tree species in the database, we run a random forest for growth, mortality, and recruitment with different parameters and explanatory variables. We then evaluate each simulation using \(R^2\) for growth and recruitment, and one minus the out-of-bag prediction error (1 - OOB) for mortality. For mortality, where alive events are much more frequent than dead events, we avoid overestimating the OOB error by balancing the sampling weights so that each class contributes the same number of observations, as suggested by Janitza & Hornung (2018). Finally, we check which variables have the greatest explanatory power over the response variable.
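The two scores and the class balancing described above can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the code behind this report: the function names `r_squared` and `balanced_sample` are hypothetical, the labels are made-up toy data, and drawing an equal number of rows per class is only one way to obtain balanced classes (the analysis itself may set per-observation sampling weights instead).

```python
import random
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def balanced_sample(labels, seed=0):
    """Return row indices with the same number of rows from every class.

    Each class is downsampled to the size of the rarest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n_min = min(len(rows) for rows in by_class.values())
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_min))
    return sorted(chosen)


# Toy mortality labels (made up): alive events far outnumber dead events.
labels = ["alive"] * 95 + ["dead"] * 5
idx = balanced_sample(labels)
counts = {label: sum(labels[i] == label for i in idx) for label in set(labels)}
```

With these toy labels, `counts` holds five rows for each class, so a tree grown on `idx` sees both outcomes equally often.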

Simulations

For each tree species, we ran a random forest varying (i) the explanatory variables and (ii) the number of variables to possibly split at in each node (mtry). The set of variables for growth and mortality is the following:

##  [1] "growth"                      "dbh0"                       
##  [3] "height"                      "canopyDistance"             
##  [5] "latitude"                    "longitude"                  
##  [7] "mean_temp_period_3_lag"      "min_temp_coldest_period_lag"
##  [9] "min_extreme_temp"            "tot_annual_pp_lag"          
## [11] "tot_pp_period3_lag"          "org_db_loc"

For recruitment, which is quantified at the \(plot\) and \(species\) level, the set of variables is:

## [1] "plot_id"                "deltaYear"              "longitude"             
## [4] "min_extreme_temp_mLag"  "tot_annual_pp_lag_mLag" "s_star_mLag"           
## [7] "relativeBA_sp_mLag"     "org_db_loc"
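The simulation design (one run per species, variable set, and mtry value) can be laid out as a simple grid. The Python sketch below is only illustrative: the species placeholders, the contents of the sets `var1` and `var2`, and the candidate mtry values are all assumptions, since the report does not list them here. Only the variable names themselves are taken from the output above.

```python
from itertools import product

# Hypothetical placeholders; the report covers 34 species.
species = ["species_A", "species_B"]

# Hypothetical variable sets built from names listed above.
variable_sets = {
    "var1": ["dbh0", "height", "canopyDistance"],
    "var2": ["dbh0", "height", "canopyDistance", "latitude", "longitude"],
}

# Hypothetical candidate values for mtry.
mtry_candidates = [2, 4]

# One run per (species, variable set, mtry); mtry cannot exceed
# the number of predictors available in the set.
runs = [
    {"species": sp, "var_set": name, "mtry": m}
    for sp, (name, predictors) in product(species, variable_sets.items())
    for m in mtry_candidates
    if m <= len(predictors)
]
```

Each entry of `runs` describes one random forest to fit; the scores of the fitted models can then be compared within each species.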

Variable importance

Summary of all species

Assuming the set of variables var1 was the best to explain recruitment and mortality, let’s see the importance of the variables present in the set var1:
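Variable importance can be measured in several ways, and this report does not state which one was used. As one common option, the Python sketch below computes a permutation importance: the increase in mean squared error after shuffling one predictor at a time. The `model` function is a toy stand-in for a fitted forest, and the data are made up for illustration.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions on rows X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=0):
    """Increase in MSE when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(model, X_perm, y) - baseline)
    return scores


def model(row):
    """Toy stand-in for a fitted forest: uses column 0, ignores column 1."""
    return 2.0 * row[0]


# Made-up data: two predictors, response generated by the toy model.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [model(row) for row in X]
importance = permutation_importance(model, X, y)
```

Shuffling the ignored predictor leaves the error unchanged, so its importance is zero, while shuffling the predictor the model relies on increases the error.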

What was the best predictor for each species?

What was the best predictor for each shade tolerance group?

Species grouped by shade tolerance: High, Medium, and Low.

What was the best predictor for each biome?

Species grouped by biome: Boreal and Temperate.

Individual species response

Performance by individual species

Which species perform best?

Performance by shade tolerance

Performance by biome

Sample size effect

Is there a correlation between explanatory power and sample size?
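One way to answer this question is a rank correlation between each species' sample size and its model score, which does not assume a linear relationship. The Python sketch below implements Spearman's coefficient by hand; the choice of Spearman is an assumption (the report does not name a method), and the sample sizes and scores are made-up toy values, not results.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values (made up): one entry per species.
sample_size = [100, 500, 2000, 10000]
score = [0.10, 0.20, 0.15, 0.40]
rho = spearman(sample_size, score)
```

A value of `rho` near 1 would suggest that species with more observations tend to be better explained, while a value near 0 would suggest no monotonic relationship.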